ABSTRACT

Photometric redshift (photo-z) estimates are playing an increasingly important role in extragalactic
astronomy and cosmology. Crucial to many photo-z applications is the accurate quantification
of photometric redshift errors and their distributions, including identification of likely catastrophic 
failures in photo-z estimates. We consider several methods of estimating photo-z errors and propose
new error estimators based on spectroscopic training set data. Using data from
the Sloan Digital Sky Survey and simulations of the Dark Energy Survey as examples, we show that
these estimators provide robust, relatively unbiased estimates of photo-z errors. We show that culling
objects with large, accurately estimated photo-z errors from a sample can reduce the incidence of 
catastrophic photo-z failures. 

Subject headings: galaxies: distances and redshifts — galaxies: photometry 



1. INTRODUCTION 

While spectroscopic redshifts have now been mea- 
sured for over one million galaxies, in recent years 
digital sky surveys have obtained multi-band imaging 
for over a hundred million galaxies. Deep, wide-area 
surveys planned for the next decade will increase the 
number of galaxies with multi-band photometry to a 
few billion. Over the last decade, substantial effort 
has gone into developing photometric redshift (photo-z)
techniques, which use multi-band photometry to estimate
approximate galaxy redshifts (Connolly et al. 1995;
Bolzonella et al. 2000; Benítez 2000; Collister & Lahav
2004; Wadadekar 2005). For many applications in
extragalactic astronomy and cosmology, the precision achieved
by photometric redshifts is sufficient, provided one can
accurately characterize the uncertainties in the photo-z
estimates, i.e., the photo-z errors. A number of recent
papers have considered the effects of photo-z errors on
cosmological probes including baryon acoustic
oscillations (Zhan & Knox 2006), weak lensing tomography
(Huterer et al. 2006; Ma et al. 2006), supernovae
(Huterer et al. 2004), and galaxy clusters (Huterer et al.
2004; Lima & Hu 2007).

A number of methods have been proposed to char- 
acterize photometric redshift errors to date. They 
can be roughly divided into two categories: methods 
based on estimating statistical errors in template fitting,
e.g., the $\chi^2$ method and its Bayesian counterparts
(Bolzonella et al. 2000; Benítez 2000), and methods
that explicitly propagate errors in the input parameters,
typically magnitudes or colors, through the photo-z
estimator (e.g., Brunner et al. 1999; Hsieh et al. 2005;
Collister & Lahav 2004).

The error in a photometric redshift estimate $z_{\rm phot}$ is
simply the difference between the photo-z estimate
and the true (hereafter, spectroscopic) redshift,
$\Delta z = z_{\rm phot} - z_{\rm spec}$. In practice, the errors for the vast
majority of objects in a deep photometric sample are
unknown, since the spectroscopic redshifts are not measured.
Our goal is to devise an estimator of $\Delta z$ that
has desirable statistical properties, e.g., minimum bias 
and variance, based on whatever information is at hand. 
Given a photo-z estimate, an error estimator should give 
the range of redshifts over which the true redshift will be 
found at some confidence level. 
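
To make these quantities concrete, the following minimal Python sketch computes $\Delta z$ for a small made-up sample and summarizes its distribution with two widths: the rms scatter and the 68% width about zero. The function name `photoz_scatter` and the toy redshifts are illustrative assumptions, not values or code from this work.

```python
import math

def photoz_scatter(z_phot, z_spec):
    """Return (rms scatter, 68% width about zero) of dz = z_phot - z_spec."""
    dz = [zp - zs for zp, zs in zip(z_phot, z_spec)]
    rms = math.sqrt(sum(d * d for d in dz) / len(dz))
    ranked = sorted(abs(d) for d in dz)
    # Index of the smallest |dz| that encloses at least 68% of the objects
    # (integer arithmetic avoids floating-point rounding in the ceiling).
    k68 = (68 * len(ranked) + 99) // 100 - 1
    return rms, ranked[k68]

# Hypothetical toy sample; the last object is a catastrophic outlier.
z_spec = [0.10, 0.20, 0.30, 0.40, 0.50]
z_phot = [0.12, 0.19, 0.33, 0.40, 0.90]
sigma, sigma68 = photoz_scatter(z_phot, z_spec)
```

In this toy sample the single outlier inflates the rms scatter well above the 68% width, which illustrates why a percentile-based width is the more robust summary when catastrophic failures are present.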

In most cases, spectroscopic redshifts are available for 
a small subset of the photometric sample. Such spectro- 
scopic samples are often used as training sets for empiri- 
cal or machine-based learning photo-z estimators. In this 
paper, we develop methods of photo-z error estimation 
that are based on the use of spectroscopic training sets to 
accurately characterize the error distribution. We show 
that training-set based error estimators outperform other 
commonly used methods when a representative training 
set is available and that they are competitive even when 
the training set is not fully representative of the photo- 
metric sample. In cases where the magnitude errors are 
not well determined, we show that the relative advan- 
tages of the new training-set based methods are further 
increased. 

This paper is organized as follows. In §2 we describe
the data sets that we use in this work. In §3 we introduce
the training-set based error estimators and their
implementations, as well as their advantages and disadvantages.
For comparison, we review the traditional
error estimators in §4 and highlight the key differences
between them and the training-set based error estimators.
We show in §5 that the overall photo-z scatter and
outlier fraction can be significantly improved by culling
objects with high estimated photo-z errors, possibly leading
to improved results in analyses that rely on photo-z's.
Finally, we offer concluding remarks in §6.

2. TEST METHODS AND DATA 

In order to fairly compare the qualities of various pho- 
tometric redshift error estimators, we have compiled two 
galaxy photometric catalogs. Each catalog consists of
spectroscopic redshifts, magnitudes in several chosen filter
passbands, and magnitude errors.

Fig. 1. Photometric versus spectroscopic redshift for the
DES mock catalog photometric set, calculated using the Neural
Network (top panel) and Hyperz (bottom panel) methods. The
dashed and dotted curves enclose 68% and 95% of the points in
each $z_{\rm spec}$ bin. In the lower right of each panel, $\sigma$ is the rms
photo-z scatter averaged over all $N$ objects in the photometric set,
$\sigma^2 = (1/N)\sum_{i=1}^{N}(\Delta z_i)^2$, and $\sigma_{68}$ is the range containing 68% of
the validation set objects in the distribution of $\Delta z$. The Hyperz
photo-z's for the DES mock catalog are calculated with $z_{\rm max}$ set
to 2.

The first catalog is a simulated data set created to
resemble observations of the proposed Dark Energy
Survey (DES) (The Dark Energy Survey Collaboration
2005). The DES is a 5000 square degree survey in 5 optical
passbands (grizY) with a magnitude limit of i ~ 24,
to be carried out using a new camera on the CTIO 4-meter
telescope. The goal of the survey is to measure
the equation of state of dark energy using several techniques:
clusters, weak lensing, angular galaxy clustering
(baryon acoustic oscillations), and supernovae. Since
DES will observe ~ 300 million galaxies, the redshifts
must be obtained using photometric methods. The DES
optical survey will be complemented in the near-infrared
by the VISTA Hemisphere Survey, an ESO Public Survey
on the VISTA 4-meter telescope that will cover the
survey area in J, H, and Ks. While the color information
provided by grizYJHKs photometry leads to improved
photo-z estimates compared to optical-only imaging, for
simplicity and purposes of illustration the mock catalog
we use here contains only griz magnitudes.

Fig. 2. Photo-z versus spectroscopic redshift for the SDSS DR3
photometric set calculated using the Neural Network (top panel)
and Hyperz (bottom panel). The Hyperz photo-z's for the SDSS
catalog are calculated with $z_{\rm max}$ set to 0.4.

The simulated DES catalog contains 200,000 galaxies
with z < 2 and with 20 < i < 24. The magnitude
and redshift distributions are derived from the galaxy
luminosity function measurements of Lin et al. (1999)
and Poli et al. (2003), while the galaxy SED type distribution
is obtained from measurements of the HDF-N/GOODS
field (Capak et al. 2003; Wirth et al. 2003;
Cowie et al. 2004). The galaxy colors are generated using
the four Coleman et al. (1980) templates (E, Sbc, Scd,
Im), extended to the UV and NIR using synthetic templates
from Bruzual & Charlot (1993). To improve the
sampling and coverage of color space, we create addi- 
tional templates by interpolating between adjacent tem- 
plates or by extrapolating from the E and Im templates. 

The second test catalog we use is based on 
the Sloan Digital Sky Survey (SDSS) Data Release 3
(Abazajian et al. 2003). Although this catalog
has been superseded by later data releases
(Adelman-McCarthy et al. 2007), for which we have published
a photo-z catalog (Oyaizu et al. 2007), it nevertheless
provides a useful testbed for studies of photo-z
errors. This SDSS catalog contains spectroscopic redshifts
and magnitudes in ugriz passbands for 292,964
galaxies from the main spectroscopic sample, which is 
flux-limited to r < 17.77. Because this sample is con- 
fined to low redshift, z < 0.3, most of the strong features 
of galaxy spectra targeted by photometric redshift esti- 
mators fall within the wavelength range covered by the 
filters. A notable exception is the Lyman alpha emitters 
at z > 2.5. However, the fraction of these high redshift 
objects in our sample is too small to have measurable 
effects on our results. 

We calculate photometric redshifts for these catalogs 
using two methods, a Neural Network (NN) method 
and the $\chi^2$-based spectral template fitting package Hyperz
(Bolzonella et al. 2000). The NN technique is a
training-set method based on fitting a parametrized function,
represented by a feed-forward multilayer perceptron
(FFMP) neural network, to the redshift-magnitude
relation embodied in a spectroscopic training set. The
implementation is the same as the one described in
Oyaizu et al. (2007) for the SDSS DR6 photo-z catalog,
except that the network configurations are different: here
we use a 4:15:15:15:1 network for the DES catalog and
a 5:15:15:15:1 network for the SDSS catalog. Figures 1
and 2 show the resulting photometric redshifts plotted
against spectroscopic redshifts for all catalogs used in 
this study. 

We split the DES and SDSS catalogs into three inde- 
pendent catalogs each, labeled training, validation, and 
photometric sets. The sizes of these sets are 50,000, 
50,000, and 100,000 for the DES and 100,000, 92,964, and 
100,000 for the SDSS. Except where noted below (§3.3),
these subsets are drawn at random from the photomet- 
ric samples, i.e., they are each statistically representative 
of the full samples. When the photo-z's are determined 
using the NN training-set method, we use the training 
and validation sets to determine the mapping from mag- 
nitudes to redshifts and magnitudes to redshift errors. 
The resulting mapping is then applied to the photomet- 
ric set for comparison of the training-set error estimator 
against other error estimation methods. Splitting the 
catalogs ensures that the training-set error methods are 
not given unfair advantage with respect to the other error
estimators. When we estimate photo-z's and photo-z
errors using template methods, we apply the methods
directly to the photometric set.
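
The random three-way split described above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper `split_catalog`, the fixed seed, and the 200-record stand-in catalog are hypothetical, and the actual catalogs use the subset sizes quoted in the text.

```python
import random

def split_catalog(catalog, sizes, seed=0):
    """Draw disjoint random subsets of the given sizes from a catalog."""
    if sum(sizes) > len(catalog):
        raise ValueError("requested subsets exceed catalog size")
    shuffled = list(catalog)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    subsets, start = [], 0
    for n in sizes:
        subsets.append(shuffled[start:start + n])
        start += n
    return subsets

# Hypothetical stand-in: integers play the role of galaxy records.
catalog = list(range(200))
training, validation, photometric = split_catalog(catalog, (50, 50, 100))
```

Because each subset is a slice of one shuffled list, the three sets are disjoint by construction, so no object used to build the error estimator is also used to test it.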

3. ERROR ESTIMATES USING TRAINING SETS 

Training-set based photo-z estimators
(e.g., Connolly et al. 1995; Csabai et al. 2003;
Collister & Lahav 2004) use a spectroscopic training
set, typically a subset of the photometric sample,
to derive a functional relation between redshift and 
photometric observables (e.g., magnitudes) which is then 
applied to the photometric sample of interest. In the 
same spirit, we can also use a training set to derive an 
estimate of the photo-z error, that is, a relation between 
photo-z error and some photometric observables. Note 
that the error estimator does not need to make use of 
the same observables as the photo-z estimator. In fact, 
we stress that the empirical photo-z error estimators are 
independent of the method used to estimate photometric 
redshifts themselves: training-set based error estimators 
can be applied to either empirical (training set) or 
template-based photo-z estimates. The assumption 
underlying the training-set based error estimator is that 
there is a functional relationship between some set of 
photometric observables and photo-z error and that 
this relationship for the training set data is reasonably 
representative of the relationship for the photometric 
sample as a whole. 

In the following subsections, we describe and test two 
basic techniques that use a spectroscopic training set to 
estimate photo-z errors. Both techniques are based on 
the simple observation that objects with similar magni- 
tudes in a photometric survey tend to have similar pho- 
tometric errors, and such magnitude errors are typically 
the largest contributors to photometric redshift error. 
Therefore, objects with similar multi-band magnitudes 
will tend to have similar photo-z errors. Moreover, such 
neighbors in magnitude space, having similar colors, usu- 
ally (but not always) correspond to galaxies with simi- 
lar SEDs. Photo-z errors depend strongly on SED type, 
since the quality of photo-z estimates is related to the 
presence of strong and broad spectral features. We can 
therefore group objects in a spectroscopic training set ac- 
cording to their magnitudes and determine the photo-z 
error as a function of the magnitudes using the train- 
ing set. For each object in the photometric set, we then 
find the objects in the training set that are near it in 
magnitude-space and associate some weighted mean of 
the measured errors for these training-set neighbors to it. 
The two methods introduced below differ in the method 
of grouping the galaxies. 

3.1. Kd-tree Error Estimator 

The first photo-z error method we consider uses a Kd- 
tree algorithm to bin training-set objects in magnitude 
space. A Kd-tree (short for K-dimensional tree) is a general
data organization and classification algorithm that
is suited for efficiently partitioning data points in multi- 
dimensional parameter spaces. In our implementation, 
the training set is partitioned into two bins at the me- 
dian value of the first photometric parameter (which we 
choose to be u mag for SDSS, g for DES). For each bin, 
the objects within the bin are further partitioned at the 
median of the second parameter (here g for SDSS, r for 
DES), resulting in $2^2 = 4$ bins. This process is continued
for the photometric parameters of interest (here the 5
magnitudes for SDSS and 4 for DES). We then return to
the first parameter, partition each bin at the median of
the first parameter for that bin, cycle again through the
parameters, and continue subdividing until the number
of objects in a bin becomes sufficiently small. Once the
partitioning is completed, we calculate the 68% width of
the error distribution centered about $z_{\rm phot} - z_{\rm spec} = 0$
for each bin and declare that to be the photo-z error
estimate for objects in the photometric sample that fall
within that bin.

Fig. 3. Upper panels: Estimated error vs empirical error for a) DES photometric set using Kd-tree error estimate, b) SDSS using
Kd-tree, c) DES using Nearest Neighbor error (NNE) estimate, and d) SDSS using NNE. In all four cases, the neural network method was
used to estimate the photo-z's. Lower panels: Corresponding distributions of $(z_{\rm phot} - z_{\rm spec})/\sigma_{\rm est}$, where $\sigma_{\rm est}$ is the photo-z error estimate
for each galaxy. Solid histograms show the distributions, dashed curves are Gaussian fits to the distributions.

Because the Kd-tree bins are always partitioned at the
median value of the object distribution in some parameter,
the number of training-set objects per bin, $N_b$, is
nearly constant from bin to bin. This constancy ensures
a nearly uniform shot-noise uncertainty ($\propto 1/\sqrt{N_b}$) on
the estimates of the photo-z errors. While this statistical
uncertainty is minimized by having many objects per
bin, large bins are "non-local" in multi-magnitude space,
and the training-set based error estimator is predicated
on the locality assumption that similar magnitudes imply
similar errors. Therefore, the optimal bin size should
be as small as possible (or smaller than the scale over
which the error distribution changes appreciably) but
large enough that the shot-noise error is not large compared
to the error induced by non-locality of the bin. For
the training set samples we consider here, we find that
$N_b \sim 100$ objects per bin is nearly optimal. The size of
the training set can also change the locality of the near- 
est Nb neighbors, and in general, the required locality 
depends on the first derivative of the redshift-magnitude 
relationship. Because such relationships are dependent 
on numerous factors, such as filter choice, selection func- 
tion, and magnitude errors, we cannot provide a general 
requirement for the training set size. We note, however, 
that in both DES mock and SDSS catalogs, we find vir- 
tually no improvement in error estimator quality when 



the training set size is larger than 20,000 galaxies. 
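
The median-split partitioning and per-bin error assignment described in this subsection can be sketched in a few lines of Python. This is an illustrative reading of the procedure, not the implementation used for the results here: the dictionary-based tree, the function names, the stopping rule `max_per_bin`, and the toy two-magnitude training set (with scatter that grows toward faint magnitudes) are all assumptions made for the example.

```python
import random

def width68(residuals):
    """68% width about zero: smallest w with >= 68% of |residuals| <= w."""
    ranked = sorted(abs(r) for r in residuals)
    return ranked[(68 * len(ranked) + 99) // 100 - 1]

def build_kd_tree(mags, dz, max_per_bin, depth=0):
    """Split at the median of one magnitude per level, cycling through them.

    mags: list of magnitude tuples; dz: matching z_phot - z_spec values.
    Each leaf stores the 68% width of dz for the training objects it holds.
    """
    if len(mags) <= max_per_bin:
        return {"sigma": width68(dz)}
    axis = depth % len(mags[0])
    order = sorted(range(len(mags)), key=lambda i: mags[i][axis])
    half = len(order) // 2
    lo, hi = order[:half], order[half:]
    return {
        "axis": axis,
        "split": mags[hi[0]][axis],  # median value along this axis
        "lo": build_kd_tree([mags[i] for i in lo], [dz[i] for i in lo],
                            max_per_bin, depth + 1),
        "hi": build_kd_tree([mags[i] for i in hi], [dz[i] for i in hi],
                            max_per_bin, depth + 1),
    }

def kd_error(tree, mag):
    """Error estimate for a photometric object: the width stored in its bin."""
    while "sigma" not in tree:
        side = "lo" if mag[tree["axis"]] < tree["split"] else "hi"
        tree = tree[side]
    return tree["sigma"]

def count_bins(tree):
    if "sigma" in tree:
        return 1
    return count_bins(tree["lo"]) + count_bins(tree["hi"])

# Hypothetical toy training set: two magnitudes, photo-z scatter that
# increases with the first magnitude.
rng = random.Random(1)
train_mags = [(rng.uniform(20, 24), rng.uniform(20, 24)) for _ in range(1024)]
train_dz = [rng.gauss(0.0, 0.02 * (m[0] - 19.0)) for m in train_mags]

tree = build_kd_tree(train_mags, train_dz, max_per_bin=64)
bright = kd_error(tree, (20.5, 22.0))
faint = kd_error(tree, (23.5, 22.0))
```

With 1024 training objects and a stopping size of 64, repeated halving yields 16 equally populated bins, mirroring the near-constant occupancy noted above; the faint query lands in a bin with a visibly larger error estimate than the bright one.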

Figures 3a and 3b show the results of applying the Kd-tree
error estimator to the DES and SDSS photometric
sets. In these cases, the neural network (NN) method
was used for the photo-z estimates. The photo-z errors
are estimated using a Kd-tree with 512 bins for the DES
catalog and 1024 bins for the SDSS catalog, corresponding
to $N_b \sim 97$ training-set objects per bin in each case.
The top panels of Figure 3 show the photo-z error estimates
vs. the measured or "empirical" errors. In order
to compute the empirical error, we first sort the galaxies
according to their estimated error. Next, we bin the
galaxies into bins of 100 objects starting from the galaxy
with the smallest estimated error, and call the average
estimated error of the galaxies within a bin the "estimated
error" of the bin, which is plotted on the vertical
axis of Figure 3. Finally, we compute the 68% width of
the $|z_{\rm phot} - z_{\rm spec}|$ distribution of each bin, and call
it the "empirical error" of the bin. The assumption here
is that if the error estimator is working properly, those 
objects with similar estimated error should follow similar 
underlying error distributions, and the underlying distri- 
bution should have a width that is well-approximated by 
the estimated error. As the figure shows, the estimated 
Kd-tree error correlates well with the true error, with 
almost no apparent bias and relatively small scatter. 
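
The estimated-versus-empirical comparison can be sketched as below. The sketch reads the empirical error as the 68% width of $|z_{\rm phot} - z_{\rm spec}|$ within each group of 100 objects; that reading, the function name, the choice to drop an incomplete final group, and the mock data with a perfect error estimator are assumptions for illustration only.

```python
import random

def empirical_vs_estimated(z_phot, z_spec, sigma_est, per_bin=100):
    """Sort by estimated error, group, and return (estimated, empirical) pairs."""
    order = sorted(range(len(sigma_est)), key=lambda i: sigma_est[i])
    k68 = (68 * per_bin + 99) // 100 - 1  # rank enclosing 68% of a full group
    rows = []
    # Assumption: an incomplete final group is dropped.
    for start in range(0, len(order) - per_bin + 1, per_bin):
        members = order[start:start + per_bin]
        estimated = sum(sigma_est[i] for i in members) / per_bin
        abs_dz = sorted(abs(z_phot[i] - z_spec[i]) for i in members)
        rows.append((estimated, abs_dz[k68]))
    return rows

# Hypothetical mock: Gaussian photo-z errors whose true widths are known,
# and an error estimator that returns exactly those widths.
rng = random.Random(2)
n = 2000
z_spec = [rng.uniform(0.0, 1.5) for _ in range(n)]
sigma_true = [rng.uniform(0.02, 0.20) for _ in range(n)]
z_phot = [zs + rng.gauss(0.0, s) for zs, s in zip(z_spec, sigma_true)]

rows = empirical_vs_estimated(z_phot, z_spec, sigma_true)
mean_ratio = sum(emp / est for est, emp in rows) / len(rows)
```

For a well-behaved estimator the pairs scatter about the one-to-one line, so the mean ratio of empirical to estimated error should be close to unity; a systematic departure would indicate a biased error estimate.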

The solid histograms in the lower panels of Figure 3
show the corresponding distributions of
$(z_{\rm phot} - z_{\rm spec})/\sigma_{\rm est}$, where $\sigma_{\rm est}$ is the Kd-tree error estimate. The
dashed curves in these panels show Gaussian fits to the
error distributions; we also indicate the best-fit Gaussian
means ($\mu_{\rm Gauss}$) and standard deviations ($\sigma_{\rm Gauss}$) as
well as the $\sigma_{68}$ widths (about zero) of the distributions
(not the fits). The fits give equal weight to each bin of
the distributions and ignore objects for which $\sigma_{\rm est} = 0$.
There is no a priori reason for these error distributions
to be Gaussian. Nevertheless, for the Kd-tree error estimator,
the error distributions are very close to Gaussians,
except for small tails seen for both the DES and
SDSS catalogs. The tails are signatures of catastrophic
photo-z failures: due to photometric errors, an intrinsically
underluminous, red galaxy at low redshift, for example,
may "scatter into" a bin mostly populated (in
the training set) by intrinsically luminous, blue galaxies 
at much higher redshift. In such degenerate cases, the 
photo-z error is large, and the Kd-tree error underesti- 
mates the true error: in this example, the Kd-tree error 
assigned to the red galaxy interloper would be dominated 
by the small errors of the blue galaxies in that bin. With 
a sufficiently large training set, one could hope to iden- 
tify such problematic bins in magnitude space, since the 
photo-z error distributions in the training set for those 
bins would show anomalous tails. 

A disadvantage of the Kd-tree method is the fact that 
the estimated error is discrete. There can only be as 
many different error estimates as there are Kd-tree bins, 
and this limits the resolution of the estimated photo-z 
errors, especially for objects with large photo-z errors 
as seen by the lack of high Kd-tree estimated errors in 
Figure 3. This problem can in principle be alleviated
by using more Kd-tree bins. However, as noted above, 
for fixed training set size, the number of bins is limited 
by the requirement that each bin should contain enough 
training-set objects to determine the error with small 
shot-noise uncertainty. 

3.2. Nearest Neighbor Error estimator 

While the Kd-tree error estimator was seen to have 
good statistical properties, we have found that a Nearest 
Neighbor Error (NNE) estimator performs even better. 
Note that the NNE has in principle nothing to do with 
Neural Networks (NN), and the readers should be care- 
ful not to confuse the similar acronyms. In this method, 
for each object in the photometric set, we estimate the 
photo-z error by using the 68% spread of the error distribution
of its $N_{\rm nei}$ nearest neighbors in the training
set. Here, nearness in magnitude space is defined using
the Euclidean metric: given two objects with two sets of
measured magnitudes $m_1$ and $m_2$, we define the distance
between them by

$$ d(m_1, m_2) = \sqrt{\sum_{\mu=1}^{N_m} \left(m_1^\mu - m_2^\mu\right)^2} , \qquad (1) $$

where $N_m$ denotes the number of magnitudes (different
passbands) measured for each object. In contrast to the
Kd-tree method, in NNE each object in the photometric 
set defines its own "bin." 

The choice of the number of nearest neighbors ($N_{\rm nei}$)
to use is analogous to the choice of the number of bins in
the Kd-tree error estimate. We prefer to keep the number
of neighbors constant for all objects in the photometric
set, since the shot noise of the resulting error estimate
is then fixed. As with the Kd-tree method, one should
choose $N_{\rm nei}$ large enough to keep the shot noise of the
estimate under control but small enough so that the error
estimate remains relatively local in magnitude space. For
the samples we have tested in this analysis, we again find
that $N_{\rm nei} = 100$ training-set neighbors is nearly optimal.
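
A brute-force version of the NNE estimate can be sketched as follows: for one photometric object, select the nearest training-set objects under the Euclidean metric in magnitude space and take the 68% width of their photo-z errors. The function name `nne_error` and the four-object toy training set are hypothetical, and the example uses far fewer neighbors than the roughly 100 discussed above purely to keep the arithmetic checkable by hand.

```python
import heapq
import math

def nne_error(mag, train_mags, train_dz, n_nei=100):
    """68% width of |dz| over the n_nei training objects nearest to mag."""
    nearest = heapq.nsmallest(
        n_nei,
        range(len(train_mags)),
        key=lambda i: math.dist(mag, train_mags[i]),  # Euclidean distance
    )
    ranked = sorted(abs(train_dz[i]) for i in nearest)
    return ranked[(68 * len(ranked) + 99) // 100 - 1]

# Hypothetical toy training set: three bright objects with small photo-z
# errors and one faint object with a large error.
train_mags = [(20.0, 19.5), (20.1, 19.6), (20.2, 19.4), (23.9, 23.0)]
train_dz = [0.01, -0.02, 0.03, 0.40]

bright_err = nne_error((20.05, 19.5), train_mags, train_dz, n_nei=3)
faint_err = nne_error((23.8, 23.1), train_mags, train_dz, n_nei=1)
```

Each query builds its own neighborhood, so the estimate varies continuously across magnitude space rather than in discrete bins. The cost of this naive search is one distance evaluation per training object per query.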

The upper panels of Figures 3c and 3d show the results
of applying the NNE estimation method to the DES
and SDSS catalogs. The discreteness that was a concern
for the Kd-tree error estimate is not present in the
NNE method. Moreover, the NNE error displays tighter
correlation with the empirical error, because a nearest-neighbor
bin for a photometric object is almost always
more local in magnitude space than a Kd-tree bin for
the same object. The lower panels of the same figures
show that the error distributions are reasonably well fit
by Gaussians, with widths that are within 5% of the
expected width $\sigma_{\rm Gauss} = 1$. Non-Gaussian tails similar
to those seen in the Kd-tree error distributions are also
present in the NNE error distributions, for the same reasons.

Fig. 4. Left: Estimated error vs. empirical error for NNE
applied to the DES catalog with Hyperz photo-z's. Right: Error
distribution for the same data.

As noted above, the NNE and the Kd-tree error meth- 
ods can be used in conjunction with any photo-z esti- 
mator, either training-set or template-based, provided 
there exists a subset of the photometric sample with 
spectroscopic redshifts. As an illustration, we use the 
Hyperz template fitting method to calculate photomet- 
ric redshifts for the full DES mock catalog (shown in 
the lower panel of Fig. 1). We then use 50,000 objects
from the DES catalog as a training set for NNE and
calculate photo-z errors for the remaining photometric
objects. Figure 4 shows the estimated vs. empirical error
(left panel) and the error distribution (right panel)
for this example. The NNE error estimate works well,
though as before it results in an underestimate when the
errors are very large ($\Delta z > 0.25$). The error distribution
is not as well fit by a Gaussian in this case; this is not
surprising, since the photo-z estimate in this case has a
net bias of ~ 23%. However, the error estimator is able
to account for the bias and is still able to predict the
error to within 12% in $\sigma_{\rm Gauss}$. This ability to include
the bias in the error estimates makes the training set
error estimate approach particularly powerful compared
to methods based on magnitude error propagation (see
§4.2).

In our implementation, computing the NNE is expensive
compared to the Kd-tree method. In
the naive implementation, computation time to find the
nearest objects scales as $N_T N_P$, where $N_T$ and $N_P$ are
the number of objects in the training set and the photometric
set, respectively (see, e.g., Press et al. 1992). In
contrast, the Kd-tree method scales as $N_P \log N_T$. For
most training-set photo-z methods, including the Neural
Network, the computation time scales as $N_P$. Therefore,
for a sizeable training set ($N_T \sim 10{,}000$ objects),
the NNE computation dominates the time involved in
estimating the photo-z's and their errors. Fortunately,
the method is trivially parallelizable, because the NNE
calculation of one object in the photometric set is independent
of all the other objects in the same set. Taking
advantage of this parallelization, the NNE estimator has
been successfully applied to a data set as large as the
SDSS DR6 (Adelman-McCarthy et al. 2007), containing
more than 78 million galaxies with Neural Net photo-z's
(Oyaizu et al. 2007). In addition, tree-structured
nearest neighbor search methods, such as the Cover-Tree
(Beygelzimer et al. 2006), can be used to improve the
computation time to $O(N_P \log N_T)$, essentially eliminating
the difference between the Kd-tree and NNE methods.
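
As a rough worked illustration of these scalings, consider the DES mock sample sizes used here, $N_T = 5\times10^4$ training objects and $N_P = 10^5$ photometric objects. This is an order-of-magnitude sketch only: it ignores constant factors and assumes a base-2 logarithm, neither of which is specified above.

```latex
\begin{align*}
N_T N_P &= (5\times10^{4})(10^{5}) = 5\times10^{9}, \\
N_P \log_2 N_T &\approx 10^{5}\times 15.6 \approx 1.6\times10^{6}, \\
\frac{N_T N_P}{N_P \log_2 N_T} &= \frac{N_T}{\log_2 N_T} \approx 3\times10^{3}.
\end{align*}
```

Under these assumptions the naive neighbor search requires about three thousand times more operations than a tree-structured search for the same samples.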

3.3. Non-representative training set 

The training-set based error estimators we have intro- 
duced rely on the spectroscopic training set to character- 
ize the errors of the photometric set. Hence, the quality 
of the error estimate depends in principle on the degree 
to which the training set is a representative subsample of 
the photometric set. Since spectroscopic samples often 
are not simply random subsets of the parent photometric 
samples from which they are drawn, one might have con- 
cerns about the robustness of these error estimates. Here, 
we consider cases of non-representative training sets and 
show that the training-set error estimators perform sat- 
isfactorily provided the training set covers the full mag- 
nitude range of the photometric sample. 

In order to illustrate the issue, we have constructed
two non-representative training sets using the DES catalog
generator. One training set (labelled Flat) has a flat
i-magnitude distribution at i < 24 instead of the increasing
distribution characteristic of a flux-limited sample;
bright (faint) objects are over- (under-)represented compared
to the photometric sample. The second training
set (labelled Extr) has an i-magnitude distribution highly
skewed to bright magnitudes, i < 22, since a spectroscopic
set typically does not go as faint as the corresponding
photometric sample. Both training sets have flat redshift
and SED type distributions, differing from those of
the fiducial DES mock catalog. The i-magnitude distributions,
as well as the $z_{\rm phot}$ vs $z_{\rm spec}$ plots, are shown in
Figure 5. Each training set contains 50,000 galaxies. We
used the training sets to derive Neural Network photo-z
solutions, which were then used to estimate photo-z's for
the DES mock photometric catalog. Photo-z errors were
estimated using the NNE method, again using the same
non-representative training sets in each case. In Fig. 6,
we show the estimated vs. empirical error (top panels)
and the error distributions (bottom panels) for the two
cases. We see that the NNE error method estimates the
errors correctly at the ~ 10% level while maintaining
Gaussianity in both cases. In the case of the Flat training
set, the error accuracy degradation is less than 1%
compared to the representative training case. Given the
fact that the Neural Network photo-z quality is itself degraded
by ~ 10% compared to the representative case in
scatter, these results show that the NNE error estimator
is robust against differing distributions of the training
and photometric sets.

Fig. 6. Top row: NNE error vs empirical error calculated using
two non-representative training sets. Bottom row: Error residual
distributions.

A possible approach to the issue of non-representative 
training sets would be to resample or weight the training- 
set objects to obtain a distribution that matches the dis- 
tribution of photometric observables (magnitudes, colors, 
etc.) of the photometric sample. In the case of the DES 
catalog and the two non-representative training sets used 
above, this resampling results in a marginal improvement 
in the error estimate at the ~ 2% level in both $\sigma_{68}$ and
$\sigma_{\rm Gauss}$. We plan to offer further discussions and test
results in subsequent articles, currently in preparation
(Lima et al. 2008; Cunha et al. 2008).
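
One simple way to weight a training set so that its distribution in an observable matches that of the photometric sample is a ratio of binned fractions. The sketch below is only an assumed illustration of that general idea in a single magnitude; it is not the weighting procedure used for the test quoted above, whose details are deferred to the articles cited. The function name, bin edges, and toy magnitudes are hypothetical.

```python
def histogram_weights(train_mag, phot_mag, lo, hi, n_bins):
    """Weight each training object by (photometric fraction) / (training
    fraction) of the magnitude bin it falls in."""
    width = (hi - lo) / n_bins

    def bin_of(m):
        # Clamp so that objects on or beyond the edges fall in the end bins.
        return min(n_bins - 1, max(0, int((m - lo) / width)))

    n_train = [0] * n_bins
    n_phot = [0] * n_bins
    for m in train_mag:
        n_train[bin_of(m)] += 1
    for m in phot_mag:
        n_phot[bin_of(m)] += 1

    weights = []
    for m in train_mag:
        b = bin_of(m)
        phot_frac = n_phot[b] / len(phot_mag)
        train_frac = n_train[b] / len(train_mag)  # nonzero: this object is in bin b
        weights.append(phot_frac / train_frac)
    return weights

# Hypothetical toy i magnitudes: the training set is too bright on average.
train_i = [20.5, 20.6, 21.5, 22.5]
phot_i = [20.5, 21.5, 21.6, 22.4, 22.5, 22.6, 22.7, 22.8]
weights = histogram_weights(train_i, phot_i, lo=20.0, hi=23.0, n_bins=3)
```

In this toy case the over-represented bright training objects are down-weighted and the single faint one is up-weighted. Such weights cannot repair regions of magnitude space that the training set does not cover at all, which is consistent with the coverage requirement stated at the start of this subsection.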

4. COMPARISON WITH OTHER ERROR ESTIMATORS 
Other photo-z error estimators have been proposed
in the literature. Two commonly used estimators are
the $\chi^2$ error in template fitting methods, such as Hyperz
(Bolzonella et al. 2000), and the propagation of
magnitude errors that is found in, for example, ANNz
(Collister & Lahav 2004). In this section, we discuss the
performance of these error estimators and consider the 
advantages and disadvantages of our training-set based 
error estimators compared to these methods. 

4.1. $\chi^2$ error estimate

Template-fitting photo-z methods often use $\chi^2$ minimization
to determine the best-fit $z_{\rm phot}$ and spectral
type. The quantity to be minimized is

$$ \chi^2 = \sum_{k=1}^{N_m} \left[ \frac{F^k_{\rm obs} - a\, F^k_{\rm temp}(z)}{\sigma^k_F} \right]^2 , \qquad (2) $$

where $F^k_{\rm obs}$ is the observed flux in passband $k$, $\sigma^k_F$ is the
corresponding uncertainty in the flux, $F^k_{\rm temp}(z)$ is the flux
of a template SED redshifted to a given $z$, $a$ is a normalization
factor, and $N_m$ is the number of passbands in
which measurements are available. This statistic is minimized
over redshift and over the set of template SEDs.
When a model being fit to data is linear in the fit parameters,
the probability distribution for the $\chi^2$ statistic
is the chi-square probability distribution for $\nu$ degrees
of freedom, $P(\chi^2|\nu)$ (Press et al. 1993). Given the value
of $\chi^2 = \chi^2_{\rm min}$ that minimizes Eq. 2, the corresponding
$P(\chi^2_{\rm min}|\nu)$ gives the probability that the observed $\chi^2$ for
a correct model should be less than $\chi^2_{\rm min}$. This probability
can be used to calculate redshift confidence intervals.
Given a confidence level $\alpha$ ($0 < \alpha < 1$), define the quantity
$\Delta\chi^2$ such that (Avni 1976)

$$ P(\chi^2 < \Delta\chi^2 \,|\, \nu) = \alpha . \qquad (3) $$

The level-$\alpha$ $z_{\rm phot}$ confidence interval is given by the set
of all redshifts for which

$$ \chi^2(z) - \chi^2_{\rm min} < \Delta\chi^2 , \qquad (4) $$

where $\chi^2(z)$ is minimized over spectral type and the coefficient
$a$. That is, $\Delta\chi^2$ is simply the increment in $\chi^2$
required to cover the region of parameter space with redshift
confidence $\alpha$. Here, we are interested in comparing
the 68% confidence interval of the photometric redshift,
so we set the parameter $\alpha = 0.68$.
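
The confidence-interval construction above can be sketched numerically. The example assumes a single parameter of interest (redshift), so that the chi-square distribution has one degree of freedom and its cumulative probability is $\mathrm{erf}(\sqrt{x/2})$; it also assumes the allowed redshifts form one connected range. The function names and the parabolic toy $\chi^2(z)$ curve are hypothetical and do not reproduce Hyperz output.

```python
import math

def delta_chi2_one_dof(alpha, tol=1e-12):
    """Solve P(chi2 < x | 1 dof) = alpha for x by bisection.

    Assumption: one degree of freedom, whose CDF is erf(sqrt(x / 2)).
    """
    lo, hi = 0.0, 50.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.erf(math.sqrt(mid / 2.0)) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def confidence_interval(z_grid, chi2, alpha=0.68):
    """Range of grid redshifts with chi2(z) - chi2_min below the threshold.

    chi2 is assumed to be already minimized over template type and
    normalization at each redshift. Returning (min, max) assumes the
    allowed set is connected; degenerate solutions can make it disjoint.
    """
    limit = min(chi2) + delta_chi2_one_dof(alpha)
    inside = [z for z, c in zip(z_grid, chi2) if c < limit]
    return min(inside), max(inside)

# Hypothetical toy curve: a parabola with minimum 3 at z = 0.5 and a
# curvature corresponding to a 1-sigma width of 0.05.
z_grid = [i / 1000 for i in range(1001)]
chi2 = [((z - 0.5) / 0.05) ** 2 + 3.0 for z in z_grid]

delta = delta_chi2_one_dof(0.68)
z_lo, z_hi = confidence_interval(z_grid, chi2, alpha=0.68)
```

For the parabolic toy curve the threshold is close to unity and the interval is close to plus or minus one Gaussian width, as expected when the model is effectively linear near the minimum. Real template fits with multiple minima would violate the connected-range assumption made here.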



In Figure 7 we show the $\chi^2$-estimated error vs. empirical error
and the residual error distribution for the Hyperz photo-z estimator
applied to the DES mock catalog. The $\chi^2$ error underestimates the
true error by about a factor of two. Furthermore, the distribution of
the error residual divided by the estimated error is decidedly
non-Gaussian, exhibiting strong tails. We attribute the underestimate
to the fact that the chi-square distribution is not a realistic
description of the true photo-z error distribution, given the
relatively strong degeneracies present in the catalog. In a test using
an artificial mock catalog containing only early-type galaxies, in
which the degeneracy between redshift and galaxy SED type is removed,
we found that the $\chi^2$ estimator was accurate at the ~30% level.
In addition, the model used in the $\chi^2$ error estimator assumes
that the fitting function, $F_{\rm temp}^{k}(z)$, is linear in the
fitting parameters, namely the redshift. In reality, the template
fitting functions are highly nonlinear, and therefore it is not
surprising that the $\chi^2$ error estimator does not robustly predict
the correct errors.

We also tried to compute $\chi^2$ errors for Hyperz applied to the
SDSS catalog, but we were not able to obtain sensible estimates. We
found no discernible correlation between the $\chi^2$ errors and the
true errors of the photo-z estimate. We discuss this issue further at
the end of Section 4.2.



4.2. Error estimate from Magnitude Derivative (MDE)

The basic assumption underlying photo-z estimates is that there is a
one-to-one mapping from photometric observables, e.g., magnitudes, to
redshift. In training-set photo-z methods, this mapping is given by an
explicit, usually analytic, function of magnitudes $\vec{m}$ and fit
coefficients $c_k$,

$$z_{\rm phot} = z_{\rm phot}(c_k, \vec{m}) , \tag{5}$$

where the $c_k$ are determined from the spectroscopic training set by
minimizing a score function, a measure of the error residuals of the
photo-z estimates. To first order, we can propagate the coefficient
errors $\sigma_{c_k}$ and the magnitude errors $\sigma_m$ to the
photo-z errors $\sigma_z$ as



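$$\sigma_z^2 = \sum_{k} \left( \frac{\partial z_{\rm phot}}{\partial c_k} \right)^2 \sigma_{c_k}^2 + \sum_{i} \left( \frac{\partial z_{\rm phot}}{\partial m_i} \right)^2 \sigma_{m_i}^2 . \tag{6}$$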



If the training set is sufficiently large (say, ~10,000 objects), the
photo-z errors due to errors in the model fit coefficients are
typically negligible compared to those arising from magnitude error
propagation. Therefore we will concentrate on the latter and define
the Magnitude Derivative error (MDE) as the second term in Eq. 6
(Collister & Lahav 2004). For polynomial fitting and NN photo-z
methods, analytic expressions for the derivatives (see, e.g., Bishop
1995 for the case of NN) can be used. However, we may also calculate
these derivatives by finite difference, in which case MDE can be
applied to any photo-z estimation method, including template fits.
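
The finite-difference route can be sketched as follows. This is an illustrative Python example, not the authors' code: `photoz` stands for any hypothetical function mapping a list of magnitudes to a redshift estimate, the central-difference step size is an arbitrary choice, and the magnitude errors are assumed uncorrelated between passbands, as in the first-order propagation above.

```python
import math


def magnitude_derivative_error(photoz, mags, mag_errs, step=1e-3):
    """Propagate magnitude errors to a photo-z error by finite differences.

    photoz   : callable taking a list of magnitudes, returning z_phot
               (hypothetical stand-in for any photo-z estimator)
    mags     : magnitudes of one object, one per passband
    mag_errs : corresponding 1-sigma magnitude errors
    step     : finite-difference step in magnitudes (an assumed value)
    """
    variance = 0.0
    for i, sigma_m in enumerate(mag_errs):
        up = list(mags)
        down = list(mags)
        up[i] += step
        down[i] -= step
        # Central-difference estimate of d z_phot / d m_i.
        dz_dm = (photoz(up) - photoz(down)) / (2.0 * step)
        variance += (dz_dm * sigma_m) ** 2
    return math.sqrt(variance)


# Toy estimator (hypothetical): linear in one colour and one magnitude.
def toy_photoz(m):
    return 0.1 + 0.2 * (m[0] - m[1]) + 0.05 * m[2]


mde = magnitude_derivative_error(toy_photoz, [21.0, 20.5, 20.0], [0.1, 0.1, 0.2])
```

Because the estimator is treated as a black box, the same function works for a neural network, a polynomial fit, or a template-fitting code. The step should be small compared with the scale over which the estimator varies, but not so small that round-off dominates.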

Figure 5 shows the performance of the MDE error calculation for the
DES mock catalog using neural network photo-z's. MDE errors
underestimate the true error by approximately 40% for this case.
Although the error residuals are nearly Gaussian, the tails of the
error distribution are more pronounced than the tails for the NNE
error, signaling the failure of MDE to correctly identify
catastrophic photo-z errors.

Collister & Lahav (2004) identify a second source of
error in neural network photo-z's: in the training pro- 
cess, the score function typically has many local minima 
with similar values. As a result, networks that start the 
minimization process at different initial values for the fit 
coefficients can end up in different local minima, resulting 
in slightly different photo-z estimates for the same input 
magnitudes. The variance in photo-z estimates due to 
this effect is an additional contribution to the photo-z 
error. By retraining our networks with different initial 
conditions, we find that the contribution of such an effect 
to the photo-z error is small (< 1% of MDE) for our two 
catalogs, not enough to account for the underestimate of 
the MDE errors when applied to neural network photo-z 
estimates. 
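
A simple way to quantify this effect for a single object is the scatter of its photo-z estimates across the retrained networks, which can then be compared with the MDE. The Python sketch below assumes the estimates from each network are already available as a list. The function name and all numbers are hypothetical and serve only as an illustration.

```python
import statistics


def initialization_scatter(estimates):
    """Sample standard deviation of one object's photo-z estimates
    obtained from networks trained with different initial coefficients."""
    return statistics.stdev(estimates)


# Hypothetical photo-z's for one galaxy from five retrained networks.
estimates = [0.4312, 0.4309, 0.4315, 0.4310, 0.4314]
mde = 0.05  # hypothetical MDE for the same galaxy

# Fraction of the MDE contributed by the choice of initial conditions.
ratio = initialization_scatter(estimates) / mde
```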

The $\chi^2$ and MDE error estimators are both predicated on the
accuracy of the quoted magnitude errors. However, photometric errors
are often difficult to estimate accurately (e.g., Scranton et al.
2005). The problem is further exacerbated if the magnitude errors in
different passbands are correlated with each other, thereby violating
the assumptions made in the $\chi^2$ fit and in magnitude error
propagation. Because of these difficulties, the MDE errors applied to
the NN photo-z estimates for the SDSS catalog are only weakly
correlated with the true errors, similar to the case of the $\chi^2$
error applied to the SDSS. A key advantage of the training-set based
error estimators is that they do not depend on the measured magnitude
errors.

5. REDUCING CATASTROPHIC OUTLIERS: CULLING 
OBJECTS BY ESTIMATED ERROR 

In certain analyses, one would like to remove objects with very
erroneous, so-called catastrophic, photo-z estimates from a sample.
If the estimated photo-z errors are reliable, then large estimated
errors can be used to identify catastrophic photo-z failures. Removing
such objects from a sample can reduce the scatter and bias in photo-z
estimates.

Fig. 9. Top: Reduction in photo-z scatter $\sigma$ when objects with
large estimated photo-z errors are culled from the sample, using two
photo-z estimators, NN and Hyperz, and four error estimators, NNE,
Kd-tree, MDE, and $\chi^2$. Horizontal axis is the fraction of objects
culled from the DES catalog. Bottom: Reduction in outlier fraction
when objects are culled by estimated photo-z error. For the DES
catalog, the outlier fraction is defined as the fraction of objects
with $|z_{\rm phot} - z_{\rm spec}| > 0.3$.

In this study, we define objects with catastrophic errors as those for
which $|z_{\rm phot} - z_{\rm spec}|$ is large compared to the photo-z
scatter, $\sigma$. Specifically, we define catastrophic errors to be
$|z_{\rm phot} - z_{\rm spec}| > 0.3$ for the DES catalog and
$|z_{\rm phot} - z_{\rm spec}| > 0.05$ for the SDSS, corresponding to
approximately 2.5 times the scatter for the NN photo-z estimate for
each survey. We define the outlier fraction to be the fraction of
objects in a photometric sample with catastrophic errors. We sort the
photometric catalogs by the galaxies' estimated photo-z errors and
track the changes in $\sigma$ and in the outlier fraction as we
successively remove objects with smaller and smaller estimated error.

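
The culling procedure can be sketched in a few lines of Python. The example is a minimal illustration under assumptions not stated in the text: the scatter is taken to be the rms of $z_{\rm phot} - z_{\rm spec}$, the catalogs are plain lists, and the function names and toy numbers are hypothetical.

```python
import math


def rms_scatter(z_phot, z_spec):
    """Root-mean-square of z_phot - z_spec (assumed definition of sigma)."""
    n = len(z_phot)
    return math.sqrt(sum((p - s) ** 2 for p, s in zip(z_phot, z_spec)) / n)


def outlier_fraction(z_phot, z_spec, threshold):
    """Fraction of objects with |z_phot - z_spec| above the threshold."""
    n_out = sum(1 for p, s in zip(z_phot, z_spec) if abs(p - s) > threshold)
    return n_out / len(z_phot)


def cull_by_estimated_error(z_phot, z_spec, est_err, cull_fraction):
    """Drop the given fraction of objects with the largest estimated error."""
    order = sorted(range(len(est_err)), key=lambda i: est_err[i])
    n_keep = len(order) - int(round(cull_fraction * len(order)))
    keep = order[:n_keep]
    return [z_phot[i] for i in keep], [z_spec[i] for i in keep]


# Toy catalog (hypothetical): the last object is a catastrophic failure
# that the error estimator has correctly flagged with a large error.
z_spec = [0.2, 0.4, 0.6, 0.8, 1.0]
z_phot = [0.21, 0.39, 0.62, 0.80, 1.50]
est_err = [0.02, 0.02, 0.03, 0.01, 0.40]

before = (rms_scatter(z_phot, z_spec), outlier_fraction(z_phot, z_spec, 0.3))
zp_cut, zs_cut = cull_by_estimated_error(z_phot, z_spec, est_err, 0.2)
after = (rms_scatter(zp_cut, zs_cut), outlier_fraction(zp_cut, zs_cut, 0.3))
```

Repeating the last three lines over a range of cull fractions traces out curves of scatter and outlier fraction versus fraction culled. The thresholds 0.3 (DES) and 0.05 (SDSS) are those quoted in the text.
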
In Figure 9 we show the dependence of the photo-z scatter, $\sigma$,
and the outlier fraction on the fraction of objects culled from the
sample based on the estimated error. We show results for the four
different error estimators described above (Kd-tree, NNE, $\chi^2$,
and MDE) for the DES mock catalog. We clearly see that the NNE and the
MDE estimators perform the best in reducing scatter and outliers,
while the $\chi^2$ method fails to adequately separate catastrophic
photo-z's from the well-behaved ones. Note that the relatively poor
performance of the $\chi^2$ method is not due to the fact that the
Hyperz photo-z scatter is larger: the NNE error estimate with the
Hyperz photo-z performs significantly better.

Fig. 10. Same as Fig. 9 but for the SDSS catalog. Here the outlier
fraction is the fraction of objects with
$|z_{\rm phot} - z_{\rm spec}| > 0.05$.

Figure 10 shows the photo-z scatter and outlier fraction for the SDSS
catalog. For this case, MDE and $\chi^2$ do not perform as well in
reducing scatter and outliers. These error estimators rely on the
reported magnitude errors, and as noted above the latter are highly
correlated between passbands and are non-Gaussian for the SDSS
catalog. In fact, culling objects with high $\chi^2$ error results in
no improvement of the scatter, a reflection of the fact that the
$\chi^2$ error for the SDSS catalog is not correlated with the actual
error of the Hyperz photo-z estimates.

Figure 11 shows $z_{\rm phot}$ vs. $z_{\rm spec}$ for the DES catalog
with NN photo-z's when 10% of the objects, those with the largest
estimated NNE errors, have been removed. Compared to the results in
the upper panel of Fig. 1, this process reduces the photo-z scatter in
the remaining objects by ~23%. Moreover, most of the catastrophic
objects at low redshift are removed, improving the bias and the
scatter at those redshifts.

This procedure of removing catastrophic objects 
changes the selection function of the sample, which in 
turn changes the redshift distribution. When culling a 
catalog using an estimated error, one should carefully 
consider the effects of the reduced sample size as well 
as the change in the selection function of the objects to 
be analyzed. Recently, there has been promising work 
showing that, for the DES mock catalog, the accuracy 
of galaxy power spectrum measurement can be improved 
by culling high estimated error galaxies using the MDE 
estimator (Banerji et al., in prep). The study finds that 
the improvement in the photo-z scatter outweighs the 
reduced statistics of the resulting smaller sample of low 
photo-z error galaxies. 

6. CONCLUSIONS 

In this paper, we have introduced a new approach to estimating
photometric redshift errors using a spectroscopic training set. We
presented two implementations of the training-set approach, Kd-tree
and Nearest Neighbor Error (NNE), and found that NNE is the best error
estimator when a representative training set is available. Compared to
the $\chi^2$ error and the MDE estimators, training-set based error
estimators are less sensitive to systematic errors in magnitude error
estimates. They incorporate both the bias and scatter of the
photo-z's, important features given the often substantial biases in
photo-z estimates. Comparison of NNE and Kd-tree errors with error
estimators from the literature shows that these training-set error
estimators are in general more accurate and better behaved (in the
sense that the error residual distribution is closer to a Gaussian).

Since a fully representative spectroscopic training set 
is not always available, we explored the impact on these 
error estimates of non-representative training sets. We 
found that this does not substantially degrade the ac- 
curacy of the training-set error estimates. In fact, we 
showed that even for training sets with very different 
magnitude and redshift distributions from the photomet- 
ric sample, the training-set error estimates remain accu- 
rate at the 10% level. 

Finally, we demonstrated that one can cull galaxies 
with large estimated errors from a sample and thereby 
significantly improve the overall scatter and bias of the 
photo-z estimates. Because the training-set error estima- 
tors are more accurate than other error estimators, and 
because the photo-z error residuals are nearly Gaussian 
distributed for these methods, culling objects using NNE
or Kd-tree results in greater performance improvement 
than culling with other error estimators. 